Multicollinearity applied stepwise stochastic imputation: a large dataset imputation through correlation-based regression

Abstract

This paper presents a stochastic imputation approach for large datasets using a correlation selection methodology when preferred commercial packages struggle to iterate due to numerical problems. A variable range-based guard rail modification is proposed that benefits the convergence rate of data elements while simultaneously providing increased confidence in the plausibility of the imputations. A country conflict dataset motivates the search to impute missing values well over the common threshold of 20% missingness. The Multicollinearity Applied Stepwise Stochastic imputation methodology (MASS-impute) capitalizes on the correlation between variables within the dataset and uses model residuals to estimate unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Tailorable tolerances exploit residual information to fit each data element. The evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the methodology provides useable and defendable results in imputing the missing elements of the dataset.
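The ingredients named in the abstract (correlation-based predictor selection, residual-driven stochastic draws, and a range-based guard rail) can be illustrated with a minimal sketch. This is not the paper's MASS-impute implementation. The function name `impute_column`, its parameters, the restriction to fully observed predictors, the use of linear terms only, and the reading of the guard rail as clipping to the observed range of the variable are all assumptions made for illustration.

```python
import numpy as np


def impute_column(X, target, n_predictors=2, seed=None):
    """Illustrative stochastic regression imputation of one column.

    Hypothetical helper, not the published MASS-impute algorithm.
    Assumes every candidate predictor column is fully observed.
    """
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)  # work on a copy, leave the input untouched
    missing = np.isnan(X[:, target])
    if not missing.any():
        return X
    observed = ~missing
    y_obs = X[observed, target]

    # Correlation selection: rank fully observed columns by |r| with the target.
    candidates = [j for j in range(X.shape[1])
                  if j != target and not np.isnan(X[:, j]).any()]
    if not candidates:
        raise ValueError("no fully observed predictor columns available")
    corr = np.array([abs(np.corrcoef(X[observed, j], y_obs)[0, 1])
                     for j in candidates])
    corr = np.nan_to_num(corr, nan=0.0)  # constant columns give NaN correlation
    order = np.argsort(corr)[::-1][:n_predictors]
    chosen = [candidates[k] for k in order]

    # Ordinary least squares with an intercept on the observed rows.
    A_obs = np.column_stack([np.ones(observed.sum()), X[observed][:, chosen]])
    beta, *_ = np.linalg.lstsq(A_obs, y_obs, rcond=None)
    residuals = y_obs - A_obs @ beta

    # Stochastic step: prediction plus a residual resampled from the fit.
    A_mis = np.column_stack([np.ones(missing.sum()), X[missing][:, chosen]])
    draws = A_mis @ beta + rng.choice(residuals, size=missing.sum(), replace=True)

    # Guard rail (assumed form): keep imputed values inside the observed range.
    X[missing, target] = np.clip(draws, y_obs.min(), y_obs.max())
    return X


if __name__ == "__main__":
    demo_rng = np.random.default_rng(0)
    a = demo_rng.normal(size=100)
    b = 2.0 * a + demo_rng.normal(scale=0.1, size=100)
    data = np.column_stack([a, b])
    data[demo_rng.random(100) < 0.3, 1] = np.nan  # roughly 30% missing in column 1
    filled = impute_column(data, target=1, n_predictors=1, seed=1)
    print("remaining NaNs:", int(np.isnan(filled).sum()))
```

Resampling residuals from the fitted model, rather than adding Gaussian noise, keeps the sketch free of distributional assumptions. A stepwise or iterative variant would repeat this per variable and revisit earlier imputations, which is omitted here.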

Similar resources

EM-based stepwise regression imputation using standard and robust methods

Imputation of missing values is one of the major tasks for data pre-processing in many areas. Whenever imputation of data from official statistics comes into mind, several (additional) challenges almost always arise, like large data sets, data sets consisting of a mixture of different variable types, or data outliers. The aim of this contribution is to propose an automatic algorithm called IRMI...

Fractional imputation using regression imputation model

Consider a finite population of $N$ elements identified by a set of indices $U = \{1, 2, \ldots, N\}$. Associated with each unit $i$ in the population there is a study variable $y_i$ and a vector $x_i$ of auxiliary variables. Let $A$ denote the set of indices for the elements in a sample selected by a set of probability rules called the sampling mechanism. Let the population quantity of interest be $\theta_N = \sum_{i=1}^{N} y_i$...

Iterative stepwise regression imputation using standard and robust methods

Imputation of missing values is one of the major tasks for data pre-processing in many areas. Whenever imputation of data from official statistics comes into mind, several (additional) challenges almost always arise, like large data sets, data sets consisting of a mixture of different variable types, or data outliers. The aim is to propose an automatic algorithm called IRMI for iterative model-...

Imputation via Triangular Regression-Based Hot Deck

In principle, hot deck imputation methods preserve means and variances, and can also preserve covariances with other variables included in the allocation matrix. In practice, dimensionality problems arise quickly as predictive variables are added and allocation matrix cells become small, undermining the hot deck's theoretical advantages. Predictive-mean nearest-neighbor imputation avoids d...

Regression Fractional Hot Deck Imputation

Imputation using a regression model is a method to preserve the correlation among variables and to provide imputed point estimators. We discuss the implementation of regression imputation using fractional imputation. By a suitable choice of fractional weights, the fractional regression imputation can take the form of hot deck fractional imputation, thus no artificial values are constructed afte...

Journal

Journal title: Journal of Big Data

Year: 2023

ISSN: 2196-1115

DOI: https://doi.org/10.1186/s40537-023-00698-4